Add Redis telemetry store, split workload spec/status, and introduce state-store metrics #21

Closed
miladhzzzz wants to merge 1 commit into main from codex/fix-reconciler-to-optimize-etcd-usage

@miladhzzzz
Contributor

Motivation

  • Reduce etcd churn for high-frequency reconciliation and event data by using Redis as a fast, ephemeral telemetry store.
  • Improve storage efficiency and clarity by separating workload spec and status persistence to allow independent updates and easier projection.
  • Add observability for state-store operations to monitor where writes are occurring.

Description

  • Added Redis configuration (REDIS_ADDR, REDIS_PASSWORD, REDIS_DB, REDIS_RECONCILE_TTL, REDIS_EVENT_TTL, REDIS_EVENT_MAX_ENTRIES) in internal/config, and updated go.mod/go.sum to include github.com/redis/go-redis/v9 and its transitive modules.
  • Implemented Redis initialization and telemetry helpers in internal/scheduler/redis_store.go, with fallback to etcd when Redis is unavailable, and functions to write reconciliation telemetry and event histories.
  • Split workload persistence into workloads-spec/ and workloads-status/ prefixes by adding workload_projection.go, refactoring saveWorkload to persist spec and status separately, and updating keys/paths in state_store.go and other scheduler methods (e.g. GetWorkloads, GetWorkloadByID, DeleteWorkloadWithContext).
  • Added a Prometheus counter state_store_writes_total and helper IncStateStoreWrite in internal/metrics and incremented it for spec, status, retry, assignment, reconciliation, and event write paths.
  • Enhanced scheduler behavior: initialize Redis in NewScheduler, close Redis in Close, avoid unnecessary writes in UpdateWorkloadStatus, skip empty log writes in UpdateWorkloadLogs, and avoid persisting unchanged metadata in UpdateWorkloadMetadata.
  • The reconciler now tries to persist high-churn reconciliation metadata to Redis (with a TTL), falls back to etcd if the Redis write fails, and skips redundant persistence for repeated NoAction results.
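
The write path described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `store` interface, `memStore`, `telemetryWriter`, the `reconcile/` key layout, and the per-backend counter map are all hypothetical stand-ins (for the Redis client, the etcd client, and the state_store_writes_total counter), and TTL handling is left out so the example runs without external services.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical minimal write interface; in the PR the two
// implementations would be backed by Redis and etcd respectively.
type store interface {
	Put(key, value string) error
}

// memStore is an in-memory stand-in so the sketch runs without a server.
type memStore struct {
	data map[string]string
	down bool // when true, every Put fails, simulating an outage
}

func newMemStore(down bool) *memStore {
	return &memStore{data: map[string]string{}, down: down}
}

func (m *memStore) Put(key, value string) error {
	if m.down {
		return errors.New("store unavailable")
	}
	m.data[key] = value
	return nil
}

// telemetryWriter tries the primary store first and falls back on error.
type telemetryWriter struct {
	primary, fallback store
	writes            map[string]int    // per-backend counter, standing in for state_store_writes_total
	last              map[string]string // last persisted action per workload ID
}

func newTelemetryWriter(primary, fallback store) *telemetryWriter {
	return &telemetryWriter{
		primary:  primary,
		fallback: fallback,
		writes:   map[string]int{},
		last:     map[string]string{},
	}
}

// Record persists one reconciliation result and reports which backend took
// the write: "primary", "fallback", or "skipped" for a repeated NoAction.
func (w *telemetryWriter) Record(workloadID, action string) (string, error) {
	if action == "NoAction" && w.last[workloadID] == "NoAction" {
		return "skipped", nil // nothing changed, so avoid a redundant write
	}
	key := "reconcile/" + workloadID // hypothetical key layout
	backend := "primary"
	if err := w.primary.Put(key, action); err != nil {
		backend = "fallback"
		if err := w.fallback.Put(key, action); err != nil {
			return "", fmt.Errorf("both stores failed: %w", err)
		}
	}
	w.writes[backend]++
	w.last[workloadID] = action
	return backend, nil
}

func main() {
	// The primary store is marked down, so writes land on the fallback.
	w := newTelemetryWriter(newMemStore(true), newMemStore(false))
	for _, action := range []string{"NoAction", "NoAction", "Restart"} {
		backend, err := w.Record("wl-1", action)
		fmt.Println(action, backend, err)
	}
}
```

Counting writes per backend is what would make the fallback visible in metrics: a rising fallback count suggests Redis is unavailable and etcd is absorbing the churn again.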

Testing

  • No automated tests were executed as part of this change.

Codex Task

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec66c23d71

Comment thread persys-scheduler/internal/scheduler/scheduler.go
if statusResp != nil && len(statusResp.Kvs) > 0 {
	_ = json.Unmarshal(statusResp.Kvs[0].Value, &st)
}
workload := models.Workload{
	ID:            spec.ID,
	Name:          spec.Name,
	Type:          spec.Type,
	RevisionID:    spec.RevisionID,
	Image:         spec.Image,
	Command:       spec.Command,
	CommandList:   spec.CommandList,
	Compose:       spec.Compose,
	ComposeYAML:   spec.ComposeYAML,
	ProjectName:   spec.ProjectName,
	GitRepo:       spec.GitRepo,
	GitBranch:     spec.GitBranch,
	GitToken:      spec.GitToken,
	EnvVars:       spec.EnvVars,
	Resources:     spec.Resources,
	DesiredState:  spec.DesiredState,
	Labels:        spec.Labels,
	LocalPath:     spec.LocalPath,
	Ports:         spec.Ports,
	Volumes:       spec.Volumes,
	Network:       spec.Network,
	RestartPolicy: spec.RestartPolicy,
	VM:            spec.VM,
	AssignedNode:  st.AssignedNode,
	NodeID:        st.NodeID,
	Status:        st.Status,
	Logs:          st.Logs,
	Metadata:      st.Metadata,
	Retry:         st.Retry,
	StatusInfo:    st.StatusInfo,
}

P2: Preserve CreatedAt when rebuilding split workloads

When reading a workload from the new split spec/status keys, the reconstruction copies most spec fields but never assigns spec.CreatedAt back to models.Workload.CreatedAt. After any scheduler restart or cache refresh, workloads loaded through this path have a zero creation timestamp, so workloadView omits CreatedAt and callers lose the creation time even though it was written into the spec projection.

@miladhzzzz requested a review from parmisk80 on May 27, 2026, 12:49
@miladhzzzz closed this on Jun 3, 2026
@miladhzzzz deleted the codex/fix-reconciler-to-optimize-etcd-usage branch on Jun 3, 2026, 10:36